Crate sscanf
A sscanf (inverse of format!()) Macro based on Regex
sscanf
sscanf is originally a C-function that takes a String, a format String with placeholders and
several Variables (in the Rust version replaced with Types). It then parses the input String,
writing the values behind the placeholders into the Variables (Rust: returns a Tuple). sscanf
can be thought of as reversing a call to format!():
// format: takes format string and values, returns String
let s = format!("Hello {}{}!", "World", 5);
assert_eq!(s, "Hello World5!");
// scanf: takes String, format string and types, returns Tuple
let parsed = sscanf::scanf!(s, "Hello {}{}!", String, usize);
// parsed is Option<(String, usize)>
assert_eq!(parsed, Some((String::from("World"), 5)));
scanf!() takes a format String like format!(), but instead of writing values into the
placeholders ({}), it extracts the values found at those {} into the returned Tuple.
If matching the format string fails, None is returned:
let s = "Text that doesn't match the format string";
let parsed = scanf!(s, "Hello {}_{}!", String, usize);
assert_eq!(parsed, None); // No match possible
Note that the original C-function and this Crate are called sscanf, which is the technically
correct version in this context. scanf (with one s) is a similar C-function that reads
console input instead of taking a String parameter. The macro itself is called scanf!()
because that is shorter, can be pronounced without sounding too weird and nobody uses the stdin
version anyway.
More examples of the capabilities of scanf:
let input = "<x=3, y=-6, z=6>";
let parsed = scanf!(input, "<x={}, y={}, z={}>", i32, i32, i32);
assert_eq!(parsed, Some((3, -6, 6)));
let input = "Move to N36E21";
let parsed = scanf!(input, "Move to {}{}{}{}", char, usize, char, usize);
assert_eq!(parsed, Some(('N', 36, 'E', 21)));
let input = "Escape literal { } as {{ and }}";
let parsed = scanf!(input, "Escape literal {{ }} as {{{{ and }}}}");
assert_eq!(parsed, Some(()));
let input = "A Sentence with Spaces. Another Sentence.";
let parsed = scanf!(input, "{}. {}.", String, String);
let (a, b) = parsed.unwrap();
assert_eq!(a, "A Sentence with Spaces");
assert_eq!(b, "Another Sentence");
let input = "Formats: 0xab01 0o127 101010 1Z";
let parsed = scanf!(input, "Formats: {x} {o} {b} {r36}", usize, i32, u8, u32);
let (a, b, c, d) = parsed.unwrap();
assert_eq!(a, 0xab01); // Hex
assert_eq!(b, 0o127); // Octal
assert_eq!(c, 0b101010); // Binary
assert_eq!(d, 71); // any radix (r36 = Radix 36)
assert_eq!(d, u32::from_str_radix("1Z", 36).unwrap());
The input in this case is a &'static str, but it can be String, &str, &String, …
Basically anything with AsRef<str> and without taking Ownership.
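As a rough mental model, the macro call for the "<x={}, y={}, z={}>" example corresponds to hand-written parsing like the sketch below. It uses only std. parse_vector is a hypothetical helper and not part of the crate, and the real macro builds a Regex instead of splitting on literal text. The AsRef<str> bound on a borrowed parameter illustrates how one function can accept &str, &String and similar inputs without taking ownership.

```rust
/// Hypothetical, std-only stand-in for
/// `scanf!(input, "<x={}, y={}, z={}>", i32, i32, i32)`.
fn parse_vector<S: AsRef<str> + ?Sized>(input: &S) -> Option<(i32, i32, i32)> {
    // Strip the literal text around the placeholders; `?` returns None on a mismatch.
    let body = input.as_ref().strip_prefix("<x=")?.strip_suffix('>')?;
    let (x, rest) = body.split_once(", y=")?;
    let (y, z) = rest.split_once(", z=")?;
    // Each captured piece is then converted with FromStr (here via str::parse).
    Some((
        x.parse::<i32>().ok()?,
        y.parse::<i32>().ok()?,
        z.parse::<i32>().ok()?,
    ))
}
```

With this signature, parse_vector("<x=3, y=-6, z=6>") and parse_vector(&some_string) both compile, and the caller keeps its String afterwards.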
The parsing part of this macro has very few limitations, since it replaces the {} with a
Regular Expression (regex) that corresponds to that type.
For example:
- char is just one Character (regex ".")
- String is any sequence of Characters (regex ".+")
- Numbers are any sequence of digits (regex "[-+]?\d+")
And so on. The actual implementation for numbers tries to take the size of the Type into account and some other details, but that is the gist of the parsing.
This means that any sequence of replacements is possible as long as the Regex finds a
combination that works. In the char, usize, char, usize example above it manages to assign
the N and E to the chars because they cannot be matched by the usizes.
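The search described above can be imitated without a regex engine. The following std-only sketch uses a hypothetical Slot enum and assign function (neither is part of the crate) and a deliberately simplified rule set: Char takes exactly one character, and Number takes one or more ASCII digits, trying the longest run first and backing off when the remaining slots cannot be filled.

```rust
/// Simplified placeholder kinds (hypothetical, for illustration only).
#[derive(Clone, Copy)]
enum Slot {
    Char,   // exactly one character, like regex "."
    Number, // one or more ASCII digits, like regex "\d+"
}

/// Tries to split `input` so that every slot gets a matching piece
/// and nothing is left over. Returns None if no combination works.
fn assign<'a>(input: &'a str, slots: &[Slot]) -> Option<Vec<&'a str>> {
    let (slot, rest) = match slots.split_first() {
        Some(pair) => pair,
        // No slots left: succeed only if the input is used up.
        None => return if input.is_empty() { Some(Vec::new()) } else { None },
    };
    // Candidate byte lengths this slot may consume, in order of preference.
    let candidates: Vec<usize> = match slot {
        Slot::Char => input.chars().next().map(|c| c.len_utf8()).into_iter().collect(),
        Slot::Number => {
            let digits = input.bytes().take_while(|b| b.is_ascii_digit()).count();
            (1..=digits).rev().collect() // longest first, then back off
        }
    };
    for len in candidates {
        if let Some(mut pieces) = assign(&input[len..], rest) {
            pieces.insert(0, &input[..len]);
            return Some(pieces);
        }
    }
    None
}
```

For "N36E21" the Number slots cannot start on a letter, so the only assignment that consumes the whole input gives N and E to the Char slots.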
Format Options
All Options are inside '{' '}'. Literal '{' or '}' inside of a Format Option are escaped
as '\{' instead of '{{' to avoid ambiguity.
Procedural macros don’t have any reliable type info and can only compare types by name. This means
that the number options below only work with a literal type like “i32”, NO Paths (std::i32)
or Wrappers (struct Wrapper(i32);) or Aliases (type Alias = i32;). ONLY i32, usize, u16, …
config | description | possible types |
---|---|---|
{/ <regex> /} | custom regex | any |
{x} | hexadecimal numbers | numbers |
{o} | octal numbers | numbers |
{b} | binary numbers | numbers |
{r2} - {r36} | radix 2 - radix 36 numbers | numbers |
{ <chrono format> } | chrono format | chrono types |
Custom Regex:
- {/.../}: Match according to the Regex between the / /
For example:
let input = "random Text";
let parsed = scanf!(input, "{/[^m]+/}{}", String, String);
// regex [^m]+ matches anything that isn't an 'm'
// => stops at the 'm' in 'random'
assert_eq!(parsed, Some((String::from("rando"), String::from("m Text"))));
As mentioned above, '{' '}' have to be escaped with a '\'. This means that:
- "{" or "}" would give a compiler error
- "\{" or "\}" lead to a "{" or "}" inside of the regex
  - curly brackets have a special meaning in regex as counted repetition
- "\\{" or "\\}" would give a compiler error
  - the first '\' escapes the second one, leaving just the brackets
- "\\\{" or "\\\}" lead to a "\{" or "\}" inside of the regex
  - the first '\' escapes the second one, leading to a literal '\' in the regex. The third escapes the curly bracket as in the second case
  - needed in order to have the regex match an actual curly bracket
Works with non-String types too:
let input = "Match 4 digits of 123456";
let parsed = scanf!(input, r"Match 4 digits of {/\d\{4\}/}{}", usize, usize);
// raw string (r"") to write \d instead of \\d
// regex \d{4} matches 4 digits
assert_eq!(parsed, Some((1234, 56)));
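The comment about raw strings describes ordinary Rust string-literal behaviour, which can be checked without the crate. In this small sketch the constant names RAW and ESCAPED are illustrative only. Both literals denote the same string value; the raw form is simply easier to read when a format string contains several backslashes.

```rust
// A raw string keeps backslashes as written.
const RAW: &str = r"Match 4 digits of {/\d\{4\}/}{}";

// A normal string literal needs every backslash doubled to say the same thing.
const ESCAPED: &str = "Match 4 digits of {/\\d\\{4\\}/}{}";
```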
Note that changing the regex of a non-String type might cause that type’s FromStr to fail.
Number Options:
Only work on primitive number types (u8, …, u128, i8, …, i128, usize, isize).
- x: hexadecimal Number (Digits 0-9 and A-F, optional Prefix 0x)
- o: octal Number (Digits 0-7, optional Prefix 0o)
- b: binary Number (Digits 0-1, optional Prefix 0b)
- r2 - r36: any radix Number (no prefix)
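The number options map closely onto the standard library's from_str_radix. The sketch below is a std-only approximation, not the crate's implementation: parse_radix is a hypothetical helper, and how the crate itself strips prefixes or handles signs is not described in this text.

```rust
/// Hypothetical helper: parses `text` in the given radix, first removing
/// `prefix` if one is given and present (it is optional, as for {x}, {o}, {b}).
fn parse_radix(text: &str, radix: u32, prefix: Option<&str>) -> Option<u32> {
    let digits = match prefix {
        // Keep the text unchanged when the optional prefix is absent.
        Some(p) => text.strip_prefix(p).unwrap_or(text),
        None => text,
    };
    // from_str_radix supports radix 2 through 36 and returns Err on invalid digits.
    u32::from_str_radix(digits, radix).ok()
}
```

For instance, parse_radix("0xab01", 16, Some("0x")) and parse_radix("ab01", 16, Some("0x")) yield the same value, while the {r36} case corresponds to parse_radix("1Z", 36, None).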
chrono integration (Requires chrono feature):
The types DateTime, NaiveDate, NaiveTime, NaiveDateTime, Utc and Local can be used and accept
a Date/Time format string inside of the { }. This will then be used for both the Regex
generation and parsing of the type.
Using DateTime returns a DateTime<FixedOffset> and is subject to the same rules and limits as
DateTime::parse_from_str.
use chrono::prelude::*;
let input = "10:37:02";
let parsed = scanf!(input, "{%H:%M:%S}", NaiveTime);
assert_eq!(parsed, Some(NaiveTime::from_hms(10, 37, 2)));
let expected = Utc.ymd(2020, 5, 23).and_hms(21, 5, 7);
// DateTime<*> directly implements FromStr and doesn't need a config
let input = "2020-05-23T21:05:07Z";
let parsed = scanf!(input, "{}", DateTime<Utc>);
assert_eq!(parsed, Some(expected));
let input = "Today is the 23. of May, 2020 at 09:05 pm and 7 seconds.";
let parsed = scanf!(input, "Today is the {%d. of %B, %Y at %I:%M %P and %-S} seconds.", Utc);
assert_eq!(parsed, Some(expected));
Note: The chrono feature needs to be active for this to work, because chrono is an optional dependency.
Custom Types
scanf works with most primitive Types from std as well as String by default. The
full list can be seen here: Implementations of RegexRepresentation.
More Types can easily be added, as long as they implement FromStr for the parsing
and RegexRepresentation for scanf to obtain the Regex of the Type:
#[derive(Debug, PartialEq)] // needed so that assert_eq! can compare and print TimeStamp
struct TimeStamp {
    year: usize, month: u8, day: u8,
    hour: u8, minute: u8,
}
impl sscanf::RegexRepresentation for TimeStamp {
    /// Matches "[year-month-day hour:minute]"
    const REGEX: &'static str = r"\[\d\d\d\d-\d\d-\d\d \d\d:\d\d\]";
}
impl std::str::FromStr for TimeStamp {
    // ...
}
let input = "[1518-10-08 23:51] Guard #751 begins shift";
let parsed = scanf!(input, "{} Guard #{} begins shift", TimeStamp, usize);
assert_eq!(parsed, Some((TimeStamp{
    year: 1518, month: 10, day: 8,
    hour: 23, minute: 51
}, 751)));
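The FromStr implementation is elided in the example above. One possible way to fill it in, using only std, is sketched below. The String error type, the field helper and the digit-splitting approach are all assumptions made for illustration. The sketch also assumes that the text it receives already has the "[year-month-day hour:minute]" shape.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct TimeStamp {
    year: usize, month: u8, day: u8,
    hour: u8, minute: u8,
}

/// Hypothetical helper: parses one numeric field or reports which one failed.
fn field<T: FromStr>(part: Option<&str>, name: &str) -> Result<T, String> {
    part.ok_or_else(|| format!("missing {}", name))?
        .parse::<T>()
        .map_err(|_| format!("invalid {}", name))
}

impl FromStr for TimeStamp {
    type Err = String; // assumption: a plain message is enough for this sketch

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split on every non-digit ('[', '-', ' ', ':', ']') and drop empty pieces,
        // which leaves the five numeric fields in order.
        let mut fields = s
            .split(|c: char| !c.is_ascii_digit())
            .filter(|part| !part.is_empty());
        Ok(TimeStamp {
            year: field(fields.next(), "year")?,
            month: field(fields.next(), "month")?,
            day: field(fields.next(), "day")?,
            hour: field(fields.next(), "hour")?,
            minute: field(fields.next(), "minute")?,
        })
    }
}
```

This version does not range-check the values (a month of 13 would be accepted); stricter validation would go into from_str.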
Implementing RegexRepresentation isn’t strictly necessary if you always supply a custom
Regex when using the type by using the {/.../} format option, but this tends to make your code
less readable.
A Note on Error Messages
Errors in the format string would ideally point to the exact position in the string that
caused the error. This is already the case if you compile/check with nightly, but not on
stable, at least until Rust Issue #54725 has progressed far enough to allow the required
method to be called from stable.
Error Messages on nightly currently look like this:
scanf!("", "Some Text {}{}{} and stuff", usize);
error: Missing Type for given '{}' Placeholder
|
4 | scanf!("", "Some Text {}{}{} and stuff", usize);
| ^^
But on stable, you are limited to only pointing at the entire format string:
error: Missing Type for given '{}' Placeholder:
At "Some Text {}{}{} and stuff"
^^
|
4 | scanf!("", "Some Text {}{}{} and stuff", usize);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The current workaround is to replicate that behavior in the Error Message itself. The
alternative is to use cargo +nightly check to see the better Errors whenever something goes
wrong, or to set your Editor plugin to check with nightly.
This does not influence the functionality in any way. This Crate works entirely on stable with no drawbacks in functionality or performance. The only difference is the compiler errors that you get while writing format strings.
Macros
scanf: A Macro to parse a String based on a format-String, similar to sscanf in C
scanf_get_regex: Same as scanf, but returns the Regex without running it. Useful for Debugging or Efficiency.
scanf_unescaped: Same as scanf, but allows use of Regex in the format String.
Structs
FullF32: A Wrapper around f32 whose RegexRepresentation also includes special floating point values like nan, inf, 2.0e5, …
FullF64: A Wrapper around f64 whose RegexRepresentation also includes special floating point values like nan, inf, 2.0e5, …
HexNumber: Matches a Hexadecimal Number with optional 0x prefix. Deprecated in favor of format options
Traits
RegexRepresentation: A Trait used by scanf to obtain the Regex of a Type